We are designing a dynamic analysis system for a database containing clinical, genomic, and proteomic data from 200 patients with Myotonic Dystrophy Type 1 (DM1), collected across 25 hospitals and entered by 13 different personnel.
Myotonic Dystrophy Type 1 is the most common adult-onset muscular dystrophy. It is a multi-systemic disorder: beyond progressive muscle weakness, patients experience cardiac conduction defects, endocrine abnormalities, cataracts, and cognitive impairment. A well-designed analytical system must therefore support exploration across multiple clinical domains and detect subtle longitudinal changes that unfold over months to years.
We use the publicly available random.cdisc.data package from the pharmaverse ecosystem, which generates synthetic CDISC ADaM datasets that mirror the structure of a real multi-centre clinical trial.
All dependencies are declared in the project’s Nix flake, so by using nix develop every package below is available without manual installation.
## All packages are provided by the Nix flake — no manual installation needed.
## If you run this report outside `nix develop`, uncomment the block below:
#
# install.packages(
# c("tidyverse", "survival", "random.cdisc.data",
# "gridExtra", "broom", "scales",
# "plotly", "htmltools", "webshot2"),
# repos = "https://cloud.r-project.org"
# )
library(tidyverse)
library(survival)
library(random.cdisc.data)
library(gridExtra)
library(broom)
library(scales)
# Detect output format: interactive plotly for HTML, static ggplot for markdown
is_html <- knitr::is_html_output()
if (is_html) {
library(plotly)
library(htmltools)
}
set.seed(42)
# To render this document
# Rscript -e 'rmarkdown::render("small_exploratory_analysis/exploratory_analysis.Rmd")'
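When the report is rendered outside `nix develop`, a quick pre-flight check can report which packages are missing before any `library()` call fails. The following is a minimal sketch; the helper name `missing_pkgs` is hypothetical and not part of the project.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("tidyverse", "survival", "random.cdisc.data",
              "gridExtra", "broom", "scales")
absent <- missing_pkgs(required)

if (length(absent) > 0) {
  message("Missing packages: ", paste(absent, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

Note that `requireNamespace()` loads a package's namespace as a side effect, which is harmless here because the same packages are attached immediately afterwards.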
The full catalogue of cached datasets can be enumerated programmatically. This is useful when onboarding new team members or scripting automated pipelines that must know what data is available:
# List every cached dataset shipped with random.cdisc.data
available <- data(package = "random.cdisc.data")$results
knitr::kable(data.frame(
Dataset = available[, "Item"],
Description = available[, "Title"],
stringsAsFactors = FALSE
))
| Dataset | Description |
|---|---|
| cadab | Cached ADAB |
| cadae | Cached ADAE |
| cadaette | Cached ADAETTE |
| cadcm | Cached ADCM |
| caddv | Cached ADDV |
| cadeg | Cached ADEG |
| cadex | Cached ADEX |
| cadhy | Cached ADHY |
| cadlb | Cached ADLB |
| cadmh | Cached ADMH |
| cadpc | Cached ADPC |
| cadpp | Cached ADPP |
| cadqlqc | Cached ADQLQC |
| cadqs | Cached ADQS |
| cadrs | Cached ADRS |
| cadsl | Cached ADSL |
| cadsub | Cached ADSUB |
| cadtr | Cached ADTR |
| cadtte | Cached ADTTE |
| cadvs | Cached ADVS |
The package ships with 20 cached datasets covering domains such as adverse events (ADAE), ECG (ADEG), exposure (ADEX), medical history (ADMH), pharmacokinetics (ADPC/ADPP), questionnaires (ADQS), tumour response (ADRS/ADTR), and more.
For this exploratory pass we select four that map naturally to the DM1 clinical context, as outlined in the consensus-based care recommendations for adults with DM1 (Ashizawa et al., 2018), which emphasise regular cardiac monitoring, hepatic and endocrine surveillance, and longitudinal safety tracking:
| Dataset | CDISC domain | Why selected for DM1 | Key variables |
|---|---|---|---|
| cadsl | ADSL | Demographics & baseline: needed for any analysis and to check arm balance | AGE, SEX, RACE, ARM, BMRKR1/2 |
| cadvs | ADVS | Vital signs over time: cardiovascular monitoring is critical in DM1 (cardiac conduction defects) | SYSBP, DIABP, PULSE per visit |
| cadlb | ADLB | Lab biomarkers: hepatic (ALT) and inflammatory (CRP) surveillance relevant to DM1 therapies | ALT, CRP, IGA per visit |
| cadaette | ADAETTE | Time-to-adverse-event: the most information-rich safety comparison across arms | Time, event/censor, by arm |
data("cadsl"); adsl <- cadsl
data("cadvs"); advs <- cadvs
data("cadlb"); adlb <- cadlb
data("cadaette"); adaette <- cadaette
tibble(
Dataset = c("ADSL", "ADVS", "ADLB", "ADAETTE"),
Rows = c(nrow(adsl), nrow(advs), nrow(adlb), nrow(adaette)),
Columns = c(ncol(adsl), ncol(advs), ncol(adlb), ncol(adaette))
) %>% knitr::kable(align = "lrr")
| Dataset | Rows | Columns |
|---|---|---|
| ADSL | 400 | 55 |
| ADVS | 16800 | 87 |
| ADLB | 8400 | 102 |
| ADAETTE | 3600 | 66 |
Datasets not used here (e.g. ADAE for individual AE listings, ADEG for ECG intervals, ADQS for patient-reported outcomes) would become relevant in a full production analysis; the proposed system is designed to accommodate all of them.
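The row counts above are consistent with a fully crossed long format: one record per subject in ADSL, one per subject, parameter and visit in ADVS and ADLB, and one per subject and parameter in ADAETTE. The sketch below checks that arithmetic. It hard-codes the counts reported in this document (400 subjects; 6 vital-sign and 3 lab parameters; 7 visits; 9 ADAETTE parameters) rather than reading the data, and the fully crossed layout is an inference from those counts, not something the package documentation is quoted on here.

```r
# Counts copied from the tables in this document (not recomputed from data).
n_subjects <- 400

expected <- c(
  ADSL    = n_subjects,          # one record per subject
  ADVS    = n_subjects * 6 * 7,  # 6 vital-sign parameters x 7 visits
  ADLB    = n_subjects * 3 * 7,  # 3 lab parameters x 7 visits
  ADAETTE = n_subjects * 9       # 9 time-to-event / count parameters
)

reported <- c(ADSL = 400, ADVS = 16800, ADLB = 8400, ADAETTE = 3600)

# Compare expected and reported row counts dataset by dataset
data.frame(expected, reported, match = expected == reported)
```

A mismatch in a real extract would point to missing visits or duplicated records and is worth investigating before any modelling.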
Before any modelling we need to understand what each dataset contains and which columns are candidates for specific analytical tasks. The table below profiles every variable: its R class, the percentage of non-missing values, the number of distinct values, and a sample entry. This is the kind of metadata catalogue the proposed system would expose through an automated data-dictionary endpoint.
profile_vars <- function(df, label) {
tibble(
Dataset = label,
Variable = names(df),
Class = sapply(df, function(x) paste(class(x), collapse = "/")),
`Non-NA %` = sapply(df, function(x) sprintf("%.0f%%", 100 * mean(!is.na(x)))),
`Distinct` = sapply(df, n_distinct),
Example = sapply(df, function(x) {
v <- na.omit(x)
if (length(v) == 0) return("NA")
as.character(v[1])
})
)
}
bind_rows(
profile_vars(adsl, "ADSL"),
profile_vars(advs, "ADVS"),
profile_vars(adlb, "ADLB"),
profile_vars(adaette, "ADAETTE")
) %>%
knitr::kable()
| Dataset | Variable | Class | Non-NA % | Distinct | Example |
|---|---|---|---|---|---|
| ADSL | STUDYID | character | 100% | 1 | AB12345 |
| ADSL | USUBJID | character | 100% | 400 | AB12345-CHN-3-id-128 |
| ADSL | SUBJID | character | 100% | 400 | id-128 |
| ADSL | SITEID | character | 100% | 95 | CHN-3 |
| ADSL | AGE | integer | 100% | 38 | 32 |
| ADSL | AGEU | factor | 100% | 1 | YEARS |
| ADSL | SEX | factor | 100% | 2 | M |
| ADSL | RACE | factor | 100% | 6 | ASIAN |
| ADSL | ETHNIC | factor | 100% | 4 | HISPANIC OR LATINO |
| ADSL | COUNTRY | factor | 100% | 9 | CHN |
| ADSL | DTHFL | factor | 100% | 2 | Y |
| ADSL | INVID | character | 100% | 95 | INV ID CHN-3 |
| ADSL | INVNAM | character | 100% | 95 | Dr. CHN-3 Doe |
| ADSL | ARM | factor | 100% | 3 | A: Drug X |
| ADSL | ARMCD | factor | 100% | 3 | ARM A |
| ADSL | ACTARM | factor | 100% | 3 | A: Drug X |
| ADSL | ACTARMCD | factor | 100% | 3 | ARM A |
| ADSL | TRT01P | factor | 100% | 3 | A: Drug X |
| ADSL | TRT01A | factor | 100% | 3 | A: Drug X |
| ADSL | TRT02P | factor | 100% | 3 | B: Placebo |
| ADSL | TRT02A | factor | 100% | 3 | A: Drug X |
| ADSL | REGION1 | factor | 100% | 6 | Asia |
| ADSL | STRATA1 | factor | 100% | 3 | C |
| ADSL | STRATA2 | factor | 100% | 2 | S2 |
| ADSL | BMRKR1 | numeric | 100% | 400 | 14.424933692778 |
| ADSL | BMRKR2 | factor | 100% | 3 | MEDIUM |
| ADSL | ITTFL | factor | 100% | 1 | Y |
| ADSL | SAFFL | factor | 100% | 1 | Y |
| ADSL | BMEASIFL | factor | 100% | 2 | Y |
| ADSL | BEP01FL | factor | 100% | 2 | Y |
| ADSL | AEWITHFL | factor | 100% | 2 | N |
| ADSL | RANDDT | Date | 100% | 296 | 2019-02-22 |
| ADSL | TRTSDTM | POSIXct/POSIXt | 100% | 400 | 2019-02-24 11:09:25.683 |
| ADSL | TRTEDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-12 04:28:08.683 |
| ADSL | TRT01SDTM | POSIXct/POSIXt | 100% | 400 | 2019-02-24 11:09:25.683 |
| ADSL | TRT01EDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-11 22:28:08.683 |
| ADSL | TRT02SDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-11 22:28:08.683 |
| ADSL | TRT02EDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-12 04:28:08.683 |
| ADSL | AP01SDTM | POSIXct/POSIXt | 100% | 400 | 2019-02-24 11:09:25.683 |
| ADSL | AP01EDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-11 22:28:08.683 |
| ADSL | AP02SDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-11 22:28:08.683 |
| ADSL | AP02EDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-12 04:28:08.683 |
| ADSL | EOSSTT | factor | 100% | 3 | DISCONTINUED |
| ADSL | EOTSTT | factor | 100% | 3 | DISCONTINUED |
| ADSL | EOSDT | Date | 82% | 179 | 2022-02-12 |
| ADSL | EOSDY | integer | 82% | 114 | 1084 |
| ADSL | DCSREAS | factor | 30% | 8 | DEATH |
| ADSL | DTHDT | Date | 18% | 44 | 2022-03-06 |
| ADSL | DTHCAUS | factor | 18% | 8 | ADVERSE EVENT |
| ADSL | DTHCAT | factor | 18% | 4 | ADVERSE EVENT |
| ADSL | LDDTHELD | integer | 18% | 36 | 22 |
| ADSL | LDDTHGR1 | factor | 18% | 3 | <=30 |
| ADSL | LSTALVDT | Date | 82% | 219 | 2022-03-06 |
| ADSL | DTHADY | integer | 18% | 67 | 1105 |
| ADSL | ADTHAUT | factor | 14% | 3 | Yes |
| ADVS | STUDYID | character | 100% | 1 | AB12345 |
| ADVS | USUBJID | character | 100% | 400 | AB12345-BRA-1-id-105 |
| ADVS | SUBJID | character | 100% | 400 | id-105 |
| ADVS | SITEID | character | 100% | 95 | BRA-1 |
| ADVS | AGE | integer | 100% | 38 | 38 |
| ADVS | AGEU | factor | 100% | 1 | YEARS |
| ADVS | SEX | factor | 100% | 2 | M |
| ADVS | RACE | factor | 100% | 6 | BLACK OR AFRICAN AMERICAN |
| ADVS | ETHNIC | factor | 100% | 4 | HISPANIC OR LATINO |
| ADVS | COUNTRY | factor | 100% | 9 | BRA |
| ADVS | DTHFL | factor | 100% | 2 | N |
| ADVS | INVID | character | 100% | 95 | INV ID BRA-1 |
| ADVS | INVNAM | character | 100% | 95 | Dr. BRA-1 Doe |
| ADVS | ARM | factor | 100% | 3 | A: Drug X |
| ADVS | ARMCD | factor | 100% | 3 | ARM A |
| ADVS | ACTARM | factor | 100% | 3 | A: Drug X |
| ADVS | ACTARMCD | factor | 100% | 3 | ARM A |
| ADVS | TRT01P | factor | 100% | 3 | A: Drug X |
| ADVS | TRT01A | factor | 100% | 3 | A: Drug X |
| ADVS | TRT02P | factor | 100% | 3 | C: Combination |
| ADVS | TRT02A | factor | 100% | 3 | A: Drug X |
| ADVS | REGION1 | factor | 100% | 6 | South America |
| ADVS | STRATA1 | factor | 100% | 3 | B |
| ADVS | STRATA2 | factor | 100% | 2 | S1 |
| ADVS | BMRKR1 | numeric | 100% | 400 | 4.15691403407286 |
| ADVS | BMRKR2 | factor | 100% | 3 | MEDIUM |
| ADVS | ITTFL | factor | 100% | 1 | Y |
| ADVS | SAFFL | factor | 100% | 1 | Y |
| ADVS | BMEASIFL | factor | 100% | 2 | Y |
| ADVS | BEP01FL | factor | 100% | 2 | Y |
| ADVS | AEWITHFL | factor | 100% | 2 | N |
| ADVS | RANDDT | Date | 100% | 296 | 2020-03-08 |
| ADVS | TRTSDTM | POSIXct/POSIXt | 100% | 400 | 2020-03-08 05:39:28.683 |
| ADVS | TRTEDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-14 20:58:26.683 |
| ADVS | TRT01SDTM | POSIXct/POSIXt | 100% | 400 | 2020-03-08 05:39:28.683 |
| ADVS | TRT01EDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADVS | TRT02SDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADVS | TRT02EDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-14 20:58:26.683 |
| ADVS | AP01SDTM | POSIXct/POSIXt | 100% | 400 | 2020-03-08 05:39:28.683 |
| ADVS | AP01EDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADVS | AP02SDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADVS | AP02EDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-14 20:58:26.683 |
| ADVS | EOSSTT | factor | 100% | 3 | DISCONTINUED |
| ADVS | EOTSTT | factor | 100% | 3 | DISCONTINUED |
| ADVS | EOSDT | Date | 82% | 179 | 2022-02-14 |
| ADVS | EOSDY | integer | 82% | 114 | 709 |
| ADVS | DCSREAS | factor | 30% | 8 | PROTOCOL VIOLATION |
| ADVS | DTHDT | Date | 18% | 44 | 2022-03-16 |
| ADVS | DTHCAUS | factor | 18% | 8 | ADVERSE EVENT |
| ADVS | DTHCAT | factor | 18% | 4 | ADVERSE EVENT |
| ADVS | LDDTHELD | integer | 18% | 36 | 24 |
| ADVS | LDDTHGR1 | factor | 18% | 3 | <=30 |
| ADVS | LSTALVDT | Date | 82% | 219 | 2022-03-09 |
| ADVS | DTHADY | integer | 18% | 67 | 496 |
| ADVS | ADTHAUT | factor | 14% | 3 | Yes |
| ADVS | ASEQ | integer | 100% | 42 | 1 |
| ADVS | VSSEQ | integer | 100% | 42 | 1 |
| ADVS | VSTESTCD | factor | 100% | 6 | DIABP |
| ADVS | VSTEST | factor | 100% | 6 | Diastolic Blood Pressure |
| ADVS | VSCAT | factor | 100% | 1 | VITAL SIGNS |
| ADVS | VSSTRESC | character | 100% | 6 | <80 |
| ADVS | ASPID | integer | 100% | 16800 | 14596 |
| ADVS | PARAM | factor | 100% | 6 | Diastolic Blood Pressure |
| ADVS | PARAMCD | factor | 100% | 6 | DIABP |
| ADVS | AVAL | numeric | 100% | 16800 | 72.5958424490508 |
| ADVS | AVALU | factor | 100% | 5 | Pa |
| ADVS | BASE2 | numeric | 100% | 2400 | 72.5958424490508 |
| ADVS | BASE | numeric | 86% | 2401 | 113.744509468754 |
| ADVS | BASETYPE | factor | 100% | 1 | LAST |
| ADVS | ABLFL2 | factor | 100% | 2 | Y |
| ADVS | ABLFL | factor | 100% | 2 | |
| ADVS | CHG2 | numeric | 100% | 14401 | 0 |
| ADVS | PCHG2 | numeric | 100% | 14401 | 0 |
| ADVS | CHG | numeric | 86% | 12002 | 0 |
| ADVS | PCHG | numeric | 86% | 12002 | 0 |
| ADVS | DTYPE | factor | 0% | 1 | NA |
| ADVS | ANRIND | factor | 100% | 3 | LOW |
| ADVS | BNRIND | factor | 100% | 3 | NORMAL |
| ADVS | ADTM | POSIXct/POSIXt | 100% | 2800 | 2020-03-26 05:39:28.683 |
| ADVS | ADY | integer | 100% | 975 | 18 |
| ADVS | ATPTN | integer | 100% | 1 | 1 |
| ADVS | AVISIT | factor | 100% | 7 | SCREENING |
| ADVS | AVISITN | integer | 100% | 7 | -1 |
| ADVS | LOQFL | factor | 100% | 2 | Y |
| ADVS | ONTRTFL | factor | 100% | 2 | |
| ADVS | ANRLO | numeric | 100% | 6 | 80 |
| ADVS | ANRHI | numeric | 100% | 5 | 120 |
| ADLB | STUDYID | character | 100% | 1 | AB12345 |
| ADLB | USUBJID | character | 100% | 400 | AB12345-BRA-1-id-105 |
| ADLB | SUBJID | character | 100% | 400 | id-105 |
| ADLB | SITEID | character | 100% | 95 | BRA-1 |
| ADLB | AGE | integer | 100% | 38 | 38 |
| ADLB | AGEU | factor | 100% | 1 | YEARS |
| ADLB | SEX | factor | 100% | 2 | M |
| ADLB | RACE | factor | 100% | 6 | BLACK OR AFRICAN AMERICAN |
| ADLB | ETHNIC | factor | 100% | 4 | HISPANIC OR LATINO |
| ADLB | COUNTRY | factor | 100% | 9 | BRA |
| ADLB | DTHFL | factor | 100% | 2 | N |
| ADLB | INVID | character | 100% | 95 | INV ID BRA-1 |
| ADLB | INVNAM | character | 100% | 95 | Dr. BRA-1 Doe |
| ADLB | ARM | factor | 100% | 3 | A: Drug X |
| ADLB | ARMCD | factor | 100% | 3 | ARM A |
| ADLB | ACTARM | factor | 100% | 3 | A: Drug X |
| ADLB | ACTARMCD | factor | 100% | 3 | ARM A |
| ADLB | TRT01P | factor | 100% | 3 | A: Drug X |
| ADLB | TRT01A | factor | 100% | 3 | A: Drug X |
| ADLB | TRT02P | factor | 100% | 3 | C: Combination |
| ADLB | TRT02A | factor | 100% | 3 | A: Drug X |
| ADLB | REGION1 | factor | 100% | 6 | South America |
| ADLB | STRATA1 | factor | 100% | 3 | B |
| ADLB | STRATA2 | factor | 100% | 2 | S1 |
| ADLB | BMRKR1 | numeric | 100% | 400 | 4.15691403407286 |
| ADLB | BMRKR2 | factor | 100% | 3 | MEDIUM |
| ADLB | ITTFL | factor | 100% | 1 | Y |
| ADLB | SAFFL | factor | 100% | 1 | Y |
| ADLB | BMEASIFL | factor | 100% | 2 | Y |
| ADLB | BEP01FL | factor | 100% | 2 | Y |
| ADLB | AEWITHFL | factor | 100% | 2 | N |
| ADLB | RANDDT | Date | 100% | 296 | 2020-03-08 |
| ADLB | TRTSDTM | POSIXct/POSIXt | 100% | 400 | 2020-03-08 05:39:28.683 |
| ADLB | TRTEDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-14 20:58:26.683 |
| ADLB | TRT01SDTM | POSIXct/POSIXt | 100% | 400 | 2020-03-08 05:39:28.683 |
| ADLB | TRT01EDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADLB | TRT02SDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADLB | TRT02EDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-14 20:58:26.683 |
| ADLB | AP01SDTM | POSIXct/POSIXt | 100% | 400 | 2020-03-08 05:39:28.683 |
| ADLB | AP01EDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADLB | AP02SDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADLB | AP02EDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-14 20:58:26.683 |
| ADLB | EOSSTT | factor | 100% | 3 | DISCONTINUED |
| ADLB | EOTSTT | factor | 100% | 3 | DISCONTINUED |
| ADLB | EOSDT | Date | 82% | 179 | 2022-02-14 |
| ADLB | EOSDY | integer | 82% | 114 | 709 |
| ADLB | DCSREAS | factor | 30% | 8 | PROTOCOL VIOLATION |
| ADLB | DTHDT | Date | 18% | 44 | 2022-03-16 |
| ADLB | DTHCAUS | factor | 18% | 8 | ADVERSE EVENT |
| ADLB | DTHCAT | factor | 18% | 4 | ADVERSE EVENT |
| ADLB | LDDTHELD | integer | 18% | 36 | 24 |
| ADLB | LDDTHGR1 | factor | 18% | 3 | <=30 |
| ADLB | LSTALVDT | Date | 82% | 219 | 2022-03-09 |
| ADLB | DTHADY | integer | 18% | 67 | 496 |
| ADLB | ADTHAUT | factor | 14% | 3 | Yes |
| ADLB | ASEQ | integer | 100% | 21 | 1 |
| ADLB | LBSEQ | integer | 100% | 21 | 1 |
| ADLB | LBTESTCD | factor | 100% | 3 | ALT |
| ADLB | LBTEST | factor | 100% | 3 | Alanine Aminotransferase Measurement |
| ADLB | LBCAT | factor | 100% | 2 | CHEMISTRY |
| ADLB | LBSTRESC | character | 100% | 3 | <7 |
| ADLB | ASPID | integer | 100% | 8400 | 6364 |
| ADLB | PARAM | factor | 100% | 3 | Alanine Aminotransferase Measurement |
| ADLB | PARAMCD | factor | 100% | 3 | ALT |
| ADLB | AVAL | numeric | 100% | 8400 | 4.2979212245254 |
| ADLB | AVALU | factor | 100% | 3 | U/L |
| ADLB | BASE2 | numeric | 100% | 1200 | 4.2979212245254 |
| ADLB | BASE | numeric | 86% | 1201 | 24.695881839145 |
| ADLB | BASETYPE | factor | 100% | 1 | LAST |
| ADLB | ABLFL2 | factor | 100% | 2 | Y |
| ADLB | ABLFL | factor | 100% | 2 | |
| ADLB | CHG2 | numeric | 100% | 7201 | 0 |
| ADLB | PCHG2 | numeric | 100% | 7201 | 0 |
| ADLB | CHG | numeric | 86% | 6002 | 0 |
| ADLB | PCHG | numeric | 86% | 6002 | 0 |
| ADLB | DTYPE | logical | 0% | 1 | NA |
| ADLB | ANRIND | factor | 100% | 3 | LOW |
| ADLB | BNRIND | factor | 100% | 3 | NORMAL |
| ADLB | SHIFT1 | factor | 100% | 10 | |
| ADLB | ATOXGR | factor | 100% | 9 | -4 |
| ADLB | BTOXGR | factor | 100% | 9 | 0 |
| ADLB | ADTM | POSIXct/POSIXt | 100% | 2800 | 2020-05-27 05:39:28.683 |
| ADLB | ADY | integer | 100% | 976 | 80 |
| ADLB | ATPTN | integer | 100% | 1 | 1 |
| ADLB | AVISIT | factor | 100% | 7 | SCREENING |
| ADLB | AVISITN | integer | 100% | 7 | -1 |
| ADLB | LOQFL | factor | 100% | 2 | Y |
| ADLB | ONTRTFL | factor | 100% | 2 | |
| ADLB | WORS01FL | factor | 100% | 2 | |
| ADLB | WGRHIFL | factor | 100% | 2 | |
| ADLB | WGRLOFL | factor | 100% | 2 | |
| ADLB | WGRHIVFL | factor | 100% | 2 | |
| ADLB | WGRLOVFL | factor | 100% | 2 | |
| ADLB | ANL01FL | factor | 100% | 2 | |
| ADLB | ANRLO | numeric | 100% | 3 | 7 |
| ADLB | ANRHI | numeric | 100% | 3 | 55 |
| ADLB | BTOXGRL | factor | 100% | 6 | 0 |
| ADLB | BTOXGRH | factor | 100% | 6 | 0 |
| ADLB | ATOXGRL | factor | 100% | 6 | 4 |
| ADLB | ATOXGRH | factor | 100% | 6 | |
| ADLB | ATOXDSCL | character | 0% | 1 | NA |
| ADLB | ATOXDSCH | character | 100% | 3 | Alanine aminotransferase increased |
| ADAETTE | STUDYID | character | 100% | 1 | AB12345 |
| ADAETTE | USUBJID | character | 100% | 400 | AB12345-BRA-1-id-105 |
| ADAETTE | SUBJID | character | 100% | 400 | id-105 |
| ADAETTE | SITEID | character | 100% | 95 | BRA-1 |
| ADAETTE | AGE | integer | 100% | 38 | 38 |
| ADAETTE | AGEU | factor | 100% | 1 | YEARS |
| ADAETTE | SEX | factor | 100% | 2 | M |
| ADAETTE | RACE | factor | 100% | 6 | BLACK OR AFRICAN AMERICAN |
| ADAETTE | ETHNIC | factor | 100% | 4 | HISPANIC OR LATINO |
| ADAETTE | COUNTRY | factor | 100% | 9 | BRA |
| ADAETTE | DTHFL | factor | 100% | 2 | N |
| ADAETTE | INVID | character | 100% | 95 | INV ID BRA-1 |
| ADAETTE | INVNAM | character | 100% | 95 | Dr. BRA-1 Doe |
| ADAETTE | ARM | factor | 100% | 3 | A: Drug X |
| ADAETTE | ARMCD | factor | 100% | 3 | ARM A |
| ADAETTE | ACTARM | factor | 100% | 3 | A: Drug X |
| ADAETTE | ACTARMCD | factor | 100% | 3 | ARM A |
| ADAETTE | TRT01P | factor | 100% | 3 | A: Drug X |
| ADAETTE | TRT01A | factor | 100% | 3 | A: Drug X |
| ADAETTE | TRT02P | factor | 100% | 3 | C: Combination |
| ADAETTE | TRT02A | factor | 100% | 3 | A: Drug X |
| ADAETTE | REGION1 | factor | 100% | 6 | South America |
| ADAETTE | STRATA1 | factor | 100% | 3 | B |
| ADAETTE | STRATA2 | factor | 100% | 2 | S1 |
| ADAETTE | BMRKR1 | numeric | 100% | 400 | 4.15691403407286 |
| ADAETTE | BMRKR2 | factor | 100% | 3 | MEDIUM |
| ADAETTE | ITTFL | factor | 100% | 1 | Y |
| ADAETTE | SAFFL | factor | 100% | 1 | Y |
| ADAETTE | BMEASIFL | factor | 100% | 2 | Y |
| ADAETTE | BEP01FL | factor | 100% | 2 | Y |
| ADAETTE | AEWITHFL | factor | 100% | 2 | N |
| ADAETTE | RANDDT | Date | 100% | 296 | 2020-03-08 |
| ADAETTE | TRTSDTM | POSIXct/POSIXt | 100% | 400 | 2020-03-08 05:39:28.683 |
| ADAETTE | TRTEDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-14 20:58:26.683 |
| ADAETTE | TRT01SDTM | POSIXct/POSIXt | 100% | 400 | 2020-03-08 05:39:28.683 |
| ADAETTE | TRT01EDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADAETTE | TRT02SDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADAETTE | TRT02EDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-14 20:58:26.683 |
| ADAETTE | AP01SDTM | POSIXct/POSIXt | 100% | 400 | 2020-03-08 05:39:28.683 |
| ADAETTE | AP01EDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADAETTE | AP02SDTM | POSIXct/POSIXt | 82% | 328 | 2021-02-14 14:58:26.683 |
| ADAETTE | AP02EDTM | POSIXct/POSIXt | 82% | 328 | 2022-02-14 20:58:26.683 |
| ADAETTE | EOSSTT | factor | 100% | 3 | DISCONTINUED |
| ADAETTE | EOTSTT | factor | 100% | 3 | DISCONTINUED |
| ADAETTE | EOSDT | Date | 82% | 179 | 2022-02-14 |
| ADAETTE | EOSDY | integer | 82% | 114 | 709 |
| ADAETTE | DCSREAS | factor | 30% | 8 | PROTOCOL VIOLATION |
| ADAETTE | DTHDT | Date | 18% | 44 | 2022-03-16 |
| ADAETTE | DTHCAUS | factor | 18% | 8 | ADVERSE EVENT |
| ADAETTE | DTHCAT | factor | 18% | 4 | ADVERSE EVENT |
| ADAETTE | LDDTHELD | integer | 18% | 36 | 24 |
| ADAETTE | LDDTHGR1 | factor | 18% | 3 | <=30 |
| ADAETTE | LSTALVDT | Date | 82% | 219 | 2022-03-09 |
| ADAETTE | DTHADY | integer | 18% | 67 | 496 |
| ADAETTE | ADTHAUT | factor | 14% | 3 | Yes |
| ADAETTE | ASEQ | integer | 100% | 9 | 6 |
| ADAETTE | TTESEQ | integer | 100% | 9 | 6 |
| ADAETTE | PARAM | factor | 100% | 9 | Time to end of AE reporting period |
| ADAETTE | PARAMCD | factor | 100% | 9 | AEREPTTE |
| ADAETTE | AVAL | numeric | 100% | 1460 | 1.94113620807666 |
| ADAETTE | AVALU | factor | 67% | 3 | YEARS |
| ADAETTE | ADTM | POSIXct/POSIXt | 67% | 2155 | 2022-02-14 |
| ADAETTE | ADY | integer | 67% | 635 | 709 |
| ADAETTE | CNSR | integer | 67% | 3 | 0 |
| ADAETTE | EVNTDESC | character | 47% | 6 | Completion or Discontinuation |
| ADAETTE | CNSDTDSC | character | 53% | 7 | |
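A profile like the one above is also easy to query. As a minimal sketch (base R, run on a toy data frame rather than the cached datasets; the helper name `sparse_vars` and the 50% threshold are illustrative assumptions), the following lists variables whose non-missing fraction falls below a threshold. This is how sparsely populated columns such as DTHDT (18%) or DCSREAS (30%) would surface.

```r
# Hypothetical helper: names of columns whose share of non-missing
# values is below `min_complete` (a fraction between 0 and 1).
sparse_vars <- function(df, min_complete = 0.5) {
  completeness <- vapply(df, function(x) mean(!is.na(x)), numeric(1))
  names(completeness)[completeness < min_complete]
}

# Toy data standing in for an ADaM dataset
toy <- data.frame(
  USUBJID = c("id-1", "id-2", "id-3", "id-4"),
  AGE     = c(32L, 45L, 51L, 38L),
  DTHDT   = as.Date(c(NA, "2022-03-06", NA, NA)),
  DCSREAS = c(NA, "DEATH", NA, NA)
)

sparse_vars(toy)
```

Sparse columns are not necessarily a data-quality problem: death dates are only populated for patients who died, so low completeness there is structural.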
The inventory helps us decide which variables to use for each analysis:

- ARM, SEX, RACE, PARAMCD → grouping / stratification factors.
- AVAL, CHG, BASE, AGE, BMRKR1 → outcome or covariate in regressions, boxplots, trajectories.
- TRTSDTM, TRTEDTM, ADTM → time-on-treatment, duration calculations.
- CNSR, EVNTDESC → time-to-event (survival) modelling.
- AVISIT, AVISITN → longitudinal panel structure, repeated-measures models.

The cached data ships with 400 subjects. To match the intended scale of 200 patients, we draw a random subsample of subject IDs and apply it to all four datasets. In a production system this step would not exist; the warehouse query would simply return the real cohort.
selected_ids <- adsl %>%
distinct(USUBJID) %>%
slice_sample(n = 200) %>%
pull(USUBJID)
adsl <- adsl %>% filter(USUBJID %in% selected_ids)
advs <- advs %>% filter(USUBJID %in% selected_ids)
adlb <- adlb %>% filter(USUBJID %in% selected_ids)
adaette <- adaette %>% filter(USUBJID %in% selected_ids)
After subsampling we have 200 patients across 63 sites and 3 treatment arms (A: Drug X, B: Placebo, C: Combination).
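Because the same ID vector filters all four tables, each should now contain exactly the selected subjects. A minimal sketch of that check on toy data follows; the helper `same_subjects` is hypothetical.

```r
# Hypothetical helper: TRUE when every data frame in `tables` contains
# exactly the subjects in `ids` (no extras, none missing).
same_subjects <- function(ids, tables) {
  all(vapply(
    tables,
    function(df) setequal(unique(df$USUBJID), ids),
    logical(1)
  ))
}

set.seed(42)
all_ids  <- sprintf("id-%02d", 1:10)
selected <- sample(all_ids, 5)

# Toy subject-level and visit-level tables
subj   <- data.frame(USUBJID = all_ids)
visits <- data.frame(USUBJID = rep(all_ids, each = 3),
                     AVISITN = rep(1:3, times = 10))

subj_sub   <- subj[subj$USUBJID %in% selected, , drop = FALSE]
visits_sub <- visits[visits$USUBJID %in% selected, , drop = FALSE]

stopifnot(same_subjects(selected, list(subj_sub, visits_sub)))
```

In the report the equivalent call would be `same_subjects(selected_ids, list(adsl, advs, adlb, adaette))`. The check fails if a selected subject has no records in one of the domains, which is itself useful to know.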
The first step in any clinical analysis is understanding who is in the study. In a multi-centre trial we need to verify that treatment arms are balanced with respect to key covariates, because imbalances in age or sex could confound every downstream comparison. In DM1 specifically, age at onset correlates inversely with CTG repeat length, so any distributional skew is a red flag.
We show two complementary views: patient counts by arm and sex, and age density curves by treatment arm.
# Patient counts by arm and sex
demo_summary <- adsl %>%
count(ARM, SEX, name = "n_patients") %>%
mutate(ARM = fct_reorder(ARM, n_patients, .fun = sum))
p1_left <- ggplot(demo_summary, aes(x = ARM, y = n_patients, fill = SEX)) +
geom_col(position = "dodge", width = 0.7, colour = "grey30", linewidth = 0.3) +
scale_fill_manual(
values = c("F" = "#E07B91", "M" = "#6BAED6"),
labels = c("F" = "Female", "M" = "Male")
) +
labs(x = NULL, y = "Number of patients", fill = "Sex") +
theme_minimal(base_size = 11) +
theme(axis.text.x = element_text(angle = 25, hjust = 1),
legend.position = "bottom")
# Age density by arm
p1_right <- ggplot(adsl, aes(x = AGE, fill = ARM, colour = ARM)) +
geom_density(alpha = 0.25, linewidth = 0.6) +
labs(x = "Age (years)", y = "Density",
fill = "Treatment arm", colour = "Treatment arm") +
theme_minimal(base_size = 11) +
theme(legend.position = "bottom")
if (is_html) {
subplot(
ggplotly(p1_left, tooltip = c("x", "y", "fill")),
ggplotly(p1_right, tooltip = c("x", "y", "fill")),
nrows = 1, shareY = FALSE, titleX = TRUE, titleY = TRUE, margin = 0.06
) %>%
layout(
title = list(text = "Demographic Overview by Treatment Arm", x = 0.5),
legend = list(orientation = "h", y = -0.15, x = 0.5, xanchor = "center"),
margin = list(t = 80, b = 80)
) %>%
add_plotly_config()
} else {
grid.arrange(p1_left, p1_right, ncol = 2)
}
Fig. 1. Demographic overview by treatment arm. Left: patient counts by arm and sex. Right: overlaid age density curves by treatment arm.
The three arms are roughly balanced in size and sex ratio. Age distributions overlap substantially, suggesting randomisation achieved its goal. A formal test (e.g. ANOVA on age, chi-square on sex) could confirm this, but for an exploratory pass the visual check is sufficient.
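For completeness, the formal balance tests mentioned above take only a few lines. This is a sketch on simulated data with no true imbalance (the data frame `toy_adsl` is made up, so the p-values are illustrative only); in the report the same calls would be applied to `adsl`.

```r
set.seed(42)

# Simulated stand-in for ADSL: 200 subjects, 3 arms, no true imbalance
toy_adsl <- data.frame(
  ARM = factor(sample(c("A: Drug X", "B: Placebo", "C: Combination"),
                      200, replace = TRUE)),
  SEX = factor(sample(c("F", "M"), 200, replace = TRUE)),
  AGE = round(rnorm(200, mean = 35, sd = 8))
)

# One-way ANOVA: does mean age differ between arms?
age_fit <- aov(AGE ~ ARM, data = toy_adsl)
age_p   <- summary(age_fit)[[1]][["Pr(>F)"]][1]

# Chi-square test of independence: is sex associated with arm?
sex_test <- chisq.test(table(toy_adsl$ARM, toy_adsl$SEX))

c(age_anova_p = age_p, sex_chisq_p = sex_test$p.value)
```

In a randomised trial such tests are descriptive rather than confirmatory: any imbalance arises by chance, so a small p-value signals a covariate worth adjusting for, not a failure of randomisation.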
DM1 is a multi-systemic disease with well-documented cardiovascular involvement—cardiac conduction defects, arrhythmias, and in some cohorts altered blood pressure regulation. Monitoring systolic blood pressure (SBP) across scheduled study visits is therefore clinically meaningful.
We use a spaghetti-plus-mean design: individual patient traces in grey, overlaid with the arm-level mean and its 95% confidence band.
This layered approach lets the reader simultaneously assess heterogeneity and treatment-level patterns without suppressing the underlying data.
sbp <- advs %>%
filter(
PARAMCD == "SYSBP",
AVISIT != "",
!is.na(AVAL),
!is.na(AVISITN)
) %>%
select(USUBJID, ARM, AVISIT, AVISITN, AVAL)
# Summary statistics per arm per visit
sbp_summary <- sbp %>%
group_by(ARM, AVISIT, AVISITN) %>%
summarise(
mean_val = mean(AVAL, na.rm = TRUE),
se = sd(AVAL, na.rm = TRUE) / sqrt(n()),
n = n(),
.groups = "drop"
) %>%
mutate(
lo = mean_val - 1.96 * se,
hi = mean_val + 1.96 * se
)
p2_static <- ggplot() +
geom_line(
data = sbp,
aes(x = AVISITN, y = AVAL, group = USUBJID),
alpha = 0.08, linewidth = 0.3, colour = "grey50"
) +
geom_ribbon(
data = sbp_summary,
aes(x = AVISITN, ymin = lo, ymax = hi, fill = ARM),
alpha = 0.25
) +
geom_line(
data = sbp_summary,
aes(x = AVISITN, y = mean_val, colour = ARM),
linewidth = 0.9
) +
geom_point(
data = sbp_summary,
aes(x = AVISITN, y = mean_val, colour = ARM),
size = 1.8
) +
facet_wrap(~ ARM, nrow = 1) +
scale_x_continuous(breaks = sort(unique(sbp_summary$AVISITN))) +
labs(
x = "Analysis visit number",
y = "Systolic BP (Pa)",
colour = "Arm", fill = "Arm"
) +
theme_minimal(base_size = 11) +
theme(legend.position = "none",
strip.text = element_text(face = "bold"))
if (is_html) {
arm_colours <- setNames(
scales::hue_pal()(n_distinct(sbp_summary$ARM)),
levels(sbp_summary$ARM)
)
p2_ly <- plot_ly()
for (arm in unique(sbp_summary$ARM)) {
arm_sbp <- sbp %>% filter(ARM == arm)
arm_summ <- sbp_summary %>% filter(ARM == arm) %>% arrange(AVISITN)
# Individual traces (light, no legend entry)
for (uid in unique(arm_sbp$USUBJID)) {
d <- arm_sbp %>% filter(USUBJID == uid)
p2_ly <- p2_ly %>% add_lines(
data = d, x = ~AVISITN, y = ~AVAL,
line = list(color = "rgba(160,160,160,0.12)", width = 0.8),
hoverinfo = "text",
text = ~paste0("Patient: ", USUBJID, "<br>Visit: ", AVISIT,
"<br>SBP: ", round(AVAL, 1)),
showlegend = FALSE, legendgroup = arm
)
}
# 95% CI ribbon
p2_ly <- p2_ly %>%
add_ribbons(
data = arm_summ, x = ~AVISITN, ymin = ~lo, ymax = ~hi,
fillcolor = paste0(arm_colours[arm], "33"),
line = list(color = "transparent"),
showlegend = FALSE, legendgroup = arm
)
# Mean line + points
p2_ly <- p2_ly %>%
add_trace(
data = arm_summ, x = ~AVISITN, y = ~mean_val,
type = "scatter", mode = "lines+markers",
line = list(color = arm_colours[arm], width = 2.5),
marker = list(color = arm_colours[arm], size = 7),
name = arm, legendgroup = arm,
hoverinfo = "text",
text = ~paste0(arm, "<br>Visit: ", AVISIT,
"<br>Mean SBP: ", round(mean_val, 1),
"<br>95% CI: [", round(lo, 1), ", ", round(hi, 1), "]",
"<br>n = ", n)
)
}
p2_ly %>% layout(
title = list(text = "Systolic Blood Pressure Trajectories Over Visits"),
xaxis = list(title = "Analysis visit number"),
yaxis = list(title = "Systolic BP (Pa)"),
legend = list(orientation = "h", y = -0.15),
hovermode = "closest"
) %>%
add_plotly_config()
} else {
p2_static
}
Fig. 2. Systolic blood pressure trajectories over scheduled study visits. Individual patient traces (grey) are overlaid with group mean ± 95 % CI by treatment arm.
Mean SBP stays relatively stable across visits in all three arms, with wide individual variability (a realistic pattern for a heterogeneous disease). In a real DM1 dataset we would also overlay cardiac conduction metrics (PR interval, QTc) from the ECG domain—cross-modal exploration that the proposed system is designed to support.
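The ribbon in Fig. 2 is a normal-approximation interval: mean ± 1.96 × SE, with SE = sd / √n. A minimal sketch of that calculation as a standalone function (the name `mean_ci` is hypothetical), shown on a small made-up vector:

```r
# Hypothetical helper: mean with a normal-approximation confidence interval.
# `z = 1.96` corresponds to a two-sided 95% interval.
mean_ci <- function(x, z = 1.96) {
  x  <- x[!is.na(x)]               # drop missing values before summarising
  m  <- mean(x)
  se <- sd(x) / sqrt(length(x))    # standard error of the mean
  c(mean = m, lo = m - z * se, hi = m + z * se)
}

sbp_toy <- c(148, 152, 150, 146, 154)  # made-up readings
mean_ci(sbp_toy)
```

With only a handful of patients per arm and visit, replacing 1.96 by the t quantile `qt(0.975, df = n - 1)` gives a wider and more honest interval; at the sample sizes used here the difference is small.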
In DM1, hepatic (ALT) and inflammatory (CRP) markers deserve close surveillance.
Showing change from baseline (CHG) rather than raw values lets us focus on within-patient shifts, which is the natural framing for a longitudinal study and for the materialised views we proposed in the analysis system (the patient-level summary view stores exactly this kind of derived quantity).
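For reference, the change-from-baseline quantities follow the usual definitions CHG = AVAL - BASE and PCHG = 100 × CHG / BASE. The cached data already carries these columns; the sketch below only illustrates the derivation on toy records, taking the baseline from the row flagged `ABLFL == "Y"` (an assumption about how baseline is identified here).

```r
# Toy lab records: two subjects, three visits each, visit 0 is baseline
toy_lb <- data.frame(
  USUBJID = rep(c("id-1", "id-2"), each = 3),
  AVISITN = rep(0:2, times = 2),
  ABLFL   = rep(c("Y", "", ""), times = 2),
  AVAL    = c(20, 25, 18, 40, 38, 50)
)

# Look up each subject's baseline value and attach it to all of their rows
baseline <- toy_lb[toy_lb$ABLFL == "Y", c("USUBJID", "AVAL")]
names(baseline)[2] <- "BASE"
toy_lb <- merge(toy_lb, baseline, by = "USUBJID")
toy_lb <- toy_lb[order(toy_lb$USUBJID, toy_lb$AVISITN), ]

# Absolute and percentage change from baseline
toy_lb$CHG  <- toy_lb$AVAL - toy_lb$BASE
toy_lb$PCHG <- 100 * toy_lb$CHG / toy_lb$BASE
toy_lb
```

Storing CHG in a materialised view avoids repeating this join at query time, at the cost of having to refresh the view whenever baseline records are corrected.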
lab_chg <- adlb %>%
filter(
PARAMCD %in% c("ALT", "CRP"),
AVISIT != "",
!is.na(CHG),
!is.na(AVISITN)
) %>%
select(USUBJID, ARM, PARAMCD, PARAM, AVISIT, AVISITN, CHG) %>%
mutate(AVISIT = fct_reorder(AVISIT, AVISITN))
p3_static <- ggplot(lab_chg, aes(x = AVISIT, y = CHG, fill = ARM)) +
geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
geom_boxplot(
outlier.size = 0.7, outlier.alpha = 0.4,
linewidth = 0.35, width = 0.7,
position = position_dodge(width = 0.8)
) +
facet_wrap(~ PARAM, scales = "free_y", ncol = 1) +
labs(
x = "Scheduled visit",
y = "Change from baseline",
fill = "Treatment arm"
) +
theme_minimal(base_size = 11) +
theme(
axis.text.x = element_text(angle = 35, hjust = 1, size = 8),
legend.position = "top",
strip.text = element_text(face = "bold", size = 11)
)
if (is_html) {
params <- unique(lab_chg$PARAMCD)
p3_panels <- lapply(params, function(pc) {
d <- lab_chg %>% filter(PARAMCD == pc)
plot_ly(
data = d, x = ~AVISIT, y = ~CHG, color = ~ARM,
type = "box",
hoverinfo = "y+name",
showlegend = (pc == params[1])
) %>%
layout(
boxmode = "group",
annotations = list(
text = paste0("<b>", unique(d$PARAM), "</b>"), xref = "paper", yref = "paper",
x = 0.5, y = 1.06, showarrow = FALSE,
font = list(size = 14)  # plotly fonts have no `face`; bold is set via <b> in the text
),
shapes = list(list(
type = "line", x0 = 0, x1 = 1, xref = "paper",
y0 = 0, y1 = 0, line = list(color = "grey", dash = "dash")
))
)
})
subplot(p3_panels, nrows = length(params), shareX = TRUE, titleY = TRUE) %>%
layout(
title = list(text = "Change from Baseline in ALT and CRP Over Visits"),
yaxis = list(title = "Change from baseline"),
yaxis2 = list(title = "Change from baseline"),
boxmode = "group",
legend = list(orientation = "h", y = -0.1),
margin = list(t = 70)
) %>%
add_plotly_config()
} else {
p3_static
}
Fig. 3. Change from baseline in ALT and CRP across study visits by treatment arm. The dashed line marks zero change; boxes show the inter-quartile range with whiskers extending to 1.5 × IQR.
Both ALT and CRP change distributions remain centred around zero with no obvious arm-level divergence, which is reassuring from a safety standpoint. The inter-quartile ranges widen slightly at later visits, consistent with increasing variability as follow-up time grows and some patients are lost. In a real DM1 study, we would flag individual patients whose ALT exceeds 3 × ULN (the aminotransferase threshold used in Hy’s Law) and join their data with the genomic modality to look for variant-level associations, which is exactly the cross-modal workflow the system is designed to enable.
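A minimal sketch of the flagging step, assuming `ANRHI` holds the upper limit of normal for each record (as the variable inventory suggests) and using toy values. The helper name `flag_alt_3x_uln` is hypothetical. Hy's Law proper also requires a concurrent bilirubin elevation, which this sketch does not check.

```r
# Toy ALT records; ANRHI is assumed to be the upper limit of normal (ULN)
toy_alt <- data.frame(
  USUBJID = c("id-1", "id-1", "id-2", "id-2", "id-3"),
  PARAMCD = "ALT",
  AVAL    = c(30, 170, 54, 60, 166),
  ANRHI   = 55
)

# Hypothetical helper: subjects with at least one ALT value above 3 x ULN
flag_alt_3x_uln <- function(df) {
  alt <- df[df$PARAMCD == "ALT" & !is.na(df$AVAL), ]
  unique(alt$USUBJID[alt$AVAL > 3 * alt$ANRHI])
}

flag_alt_3x_uln(toy_alt)
```

The returned subject IDs are the join key for pulling the matching records from other modalities.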
Time-to-event analysis is the most information-rich way to compare safety profiles across treatment arms. The Kaplan-Meier estimator gives non-parametric survival curves; the log-rank test provides a formal between-arm comparison.
In the proposed analysis system, this would be exposed as a parameterised module: the researcher selects the event of interest (any AE, serious AE, grade 3–5 AE), stratification variables, and optional covariates via a configuration file, and the pipeline produces the curves and test results automatically.
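One way such a parameterised module might look is sketched below. The configuration fields (`time_var`, `event_var`, `strata`) and the function name `run_km_module()` are hypothetical, and `survival::lung` stands in for the study data so the sketch runs on its own; a real module would also filter on the selected `PARAMCD` before fitting.

```r
library(survival)

# Hypothetical configuration, as it might be parsed from a YAML or JSON file
config <- list(
  time_var  = "time",
  event_var = "status",
  strata    = "sex"
)

# Hypothetical module entry point: fits Kaplan-Meier curves and runs the
# log-rank test for whichever columns the configuration names
run_km_module <- function(data, cfg) {
  f <- as.formula(
    sprintf("Surv(%s, %s) ~ %s", cfg$time_var, cfg$event_var, cfg$strata)
  )
  km <- survfit(f, data = data)
  lr <- survdiff(f, data = data)
  p  <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
  list(formula = f, km = km, logrank_p = p)
}

# survival::lung is only a stand-in dataset for this sketch
res <- run_km_module(survival::lung, config)
print(res$logrank_p)
```

Keeping the model specification in a plain list means the same function can be driven from a dashboard, a notebook, or a batch pipeline without code changes.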
# Available time-to-event endpoints
tte_params <- adaette %>% distinct(PARAMCD, PARAM)
knitr::kable(tte_params, col.names = c("Code", "Description"))
| Code | Description |
|---|---|
| AEREPTTE | Time to end of AE reporting period |
| AETOT1 | Number of occurrences of any adverse event |
| AETOT2 | Number of occurrences of any serious adverse event |
| AETOT3 | Number of occurrences of a grade 3-5 adverse event |
| AETTE1 | Time to first occurrence of any adverse event |
| AETTE2 | Time to first occurrence of any serious adverse event |
| AETTE3 | Time to first occurrence of a grade 3-5 adverse event |
| HYSTTEBL | Time to Hy’s Law Elevation in relation to Baseline |
| HYSTTEUL | Time to Hy’s Law Elevation in relation to ULN |
We select AETTE1 (time to first occurrence of any adverse event), the broadest safety endpoint.
tte_data <- adaette %>%
filter(
PARAMCD == "AETTE1",
!is.na(AVAL),
!is.na(CNSR)
) %>%
select(USUBJID, ARM, AVAL, CNSR, PARAM) %>%
distinct(USUBJID, .keep_all = TRUE) %>%
mutate(
event = 1 - CNSR,
time_wks = AVAL / 7
)
event_label <- tte_data %>% pull(PARAM) %>% unique() %>% first()
# Kaplan-Meier fit
surv_obj <- Surv(time = tte_data$time_wks, event = tte_data$event)
km_fit <- survfit(surv_obj ~ ARM, data = tte_data)
# Log-rank test
lr_test <- survdiff(surv_obj ~ ARM, data = tte_data)
lr_pval <- pchisq(lr_test$chisq, df = length(lr_test$n) - 1, lower.tail = FALSE)
Events observed: 125 / 200 patients experienced the event. Log-rank p-value: 0.0446.
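To make the estimator behind `survfit()` concrete, the product-limit formula \(\hat S(t) = \prod_{t_i \le t} (1 - d_i / n_i)\) can be evaluated by hand on a small example and compared with the package result. The toy data below are invented for illustration; only the `CNSR` convention (1 = censored) is taken from the analysis above.

```r
library(survival)

# Toy data: follow-up in weeks and ADaM-style censoring flag (CNSR = 1 is censored)
toy <- data.frame(
  time_wks = c(2, 3, 3, 5, 8, 8, 9, 12),
  CNSR     = c(0, 0, 1, 0, 0, 0, 1, 1)
)
toy$event <- 1 - toy$CNSR

# Product-limit estimate: at each event time, multiply by (1 - d_i / n_i),
# where d_i is the number of events and n_i the number still at risk
event_times <- sort(unique(toy$time_wks[toy$event == 1]))
n_at_risk <- sapply(event_times, function(tt) sum(toy$time_wks >= tt))
n_events  <- sapply(event_times, function(tt) sum(toy$time_wks == tt & toy$event == 1))
km_manual <- cumprod(1 - n_events / n_at_risk)

# Same quantity from survfit(), read off at the event times
fit <- survfit(Surv(time_wks, event) ~ 1, data = toy)
km_survfit <- summary(fit, times = event_times)$surv

print(data.frame(event_times, n_at_risk, n_events, km_manual, km_survfit))
```

Censored patients never contribute an event, but they remain in the risk set until their censoring time, which is why the denominators shrink between event times.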
km_tidy <- tidy(km_fit) %>%
mutate(ARM = str_remove(strata, "^ARM="))
# Number-at-risk at evenly spaced time points
risk_times <- seq(0, max(km_tidy$time, na.rm = TRUE), length.out = 6) %>% round(1)
risk_summary <- summary(km_fit, times = risk_times)
risk_tbl <- tibble(
time = risk_summary$time,
n.risk = risk_summary$n.risk,
ARM = str_remove(risk_summary$strata, "^ARM=")
)
if (is_html) {
arm_colours <- setNames(
scales::hue_pal()(n_distinct(km_tidy$ARM)),
unique(km_tidy$ARM)
)
p4_ly <- plot_ly()
for (arm in unique(km_tidy$ARM)) {
d <- km_tidy %>% filter(ARM == arm) %>% arrange(time)
# CI ribbon
p4_ly <- p4_ly %>%
add_ribbons(
data = d, x = ~time, ymin = ~conf.low, ymax = ~conf.high,
fillcolor = plotly::toRGB(arm_colours[[arm]], alpha = 0.13),
line = list(color = "transparent"),
showlegend = FALSE, legendgroup = arm,
hoverinfo = "skip"
)
# Step curve
p4_ly <- p4_ly %>%
add_trace(
data = d, x = ~time, y = ~estimate,
type = "scatter", mode = "lines",
line = list(color = arm_colours[arm], width = 2.2, shape = "hv"),
name = arm, legendgroup = arm,
hoverinfo = "text",
text = ~paste0(
ARM, # data column, not the loop variable `arm`: formulas are evaluated lazily at build time
"<br>Time: ", round(time, 1), " weeks",
"<br>Event-free: ", scales::percent(estimate, accuracy = 0.1),
"<br>95% CI: [", scales::percent(conf.low, accuracy = 0.1),
", ", scales::percent(conf.high, accuracy = 0.1), "]",
"<br>n.risk: ", n.risk,
"<br>n.event: ", n.event
)
)
# Censor tick marks
censored <- d %>% filter(n.censor > 0)
if (nrow(censored) > 0) {
p4_ly <- p4_ly %>%
add_markers(
data = censored, x = ~time, y = ~estimate,
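marker = list(symbol = "line-ns", size = 8,
color = arm_colours[[arm]],
# line-* symbols are drawn as strokes, so the tick colour comes from marker.line
line = list(width = 1.5, color = arm_colours[[arm]])),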
showlegend = FALSE, legendgroup = arm,
hoverinfo = "text",
text = ~paste0("Censored (n=", n.censor, ")")
)
}
}
p4_ly %>% layout(
title = list(text = paste0(
"Kaplan-Meier: Time to First Adverse Event",
"<br><sup>Log-rank p = ", sprintf("%.4f", lr_pval), "</sup>"
)),
xaxis = list(title = "Time (weeks)"),
yaxis = list(title = "Event-free probability",
tickformat = ".0%", range = c(0, 1)),
legend = list(orientation = "h", y = -0.15),
hovermode = "closest",
margin = list(t = 80)
) %>%
add_plotly_config()
} else {
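# Static fallback for github_document
# Compute each step's right edge within its own arm. Calling lead() inside aes()
# runs over the whole data frame, so the last band of one arm would otherwise be
# drawn back to the first time point of the next arm.
km_band <- km_tidy %>%
group_by(ARM) %>%
arrange(time, .by_group = TRUE) %>%
mutate(time_next = lead(time, default = max(time))) %>%
ungroup()
p_km <- ggplot(km_tidy, aes(x = time, y = estimate, colour = ARM, fill = ARM)) +
geom_step(linewidth = 0.8) +
geom_rect(
data = km_band,
aes(
xmin = time,
xmax = time_next,
ymin = conf.low, ymax = conf.high
),
alpha = 0.10, colour = NA
) +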
annotate(
"text",
x = max(km_tidy$time) * 0.65, y = 0.15,
label = sprintf("Log-rank p = %.4f", lr_pval),
size = 3.5, fontface = "italic", colour = "grey30"
) +
scale_y_continuous(labels = percent_format(), limits = c(0, 1)) +
labs(
x = "Time (weeks)",
y = "Event-free probability",
colour = "Treatment arm",
fill = "Treatment arm"
) +
theme_minimal(base_size = 11) +
theme(legend.position = "top")
p_risk <- ggplot(risk_tbl, aes(x = time, y = ARM, label = n.risk)) +
geom_text(size = 3) +
labs(x = "Time (weeks)", y = NULL, title = "Number at risk") +
theme_minimal(base_size = 9) +
theme(
panel.grid = element_blank(),
plot.title = element_text(size = 9, face = "bold"),
axis.text.y = element_text(face = "bold")
)
grid.arrange(p_km, p_risk, ncol = 1, heights = c(4, 1))
}
Fig. 4. Kaplan–Meier curves for time to first adverse event by treatment arm, with 95 % confidence bands and censoring tick marks. The log-rank test p-value is annotated.
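A note on the number-at-risk table: `summary()` on a `survfit` object silently drops requested time points that lie beyond the last observation in a stratum unless `extend = TRUE` is passed, so arms with shorter follow-up can end up with fewer rows than the others. The sketch below shows the difference on `survival::lung`, used here purely as a stand-in dataset; the time grid is an arbitrary choice for the example.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = survival::lung)

# 1100 days lies beyond the last observation in both strata
grid <- c(0, 500, 1100)

default_tbl  <- summary(fit, times = grid)                 # drops 1100
extended_tbl <- summary(fit, times = grid, extend = TRUE)  # keeps it, n.risk = 0

risk_demo <- data.frame(
  strata = as.character(extended_tbl$strata),
  time   = extended_tbl$time,
  n.risk = extended_tbl$n.risk
)
print(risk_demo)
```

With `extend = TRUE` every arm gets one row per requested time point, which keeps the columns of a risk table aligned across arms.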
The Kaplan-Meier curves separate modestly between arms, with a log-rank p-value of 0.0446, nominally significant at the 0.05 level. Because the data are synthetic, this result illustrates the workflow rather than a treatment effect. In a real DM1 study it would warrant further investigation with a Cox proportional-hazards model adjusting for baseline covariates (age, sex, disease severity, CTG repeat length), and the result would be cross-referenced with the genomic and proteomic modalities to identify biological correlates of adverse event risk.
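A covariate-adjusted follow-up of this kind might start from a sketch like the one below. It uses `survival::lung` as a stand-in, with `age` and `sex` playing the role of baseline covariates; disease severity and CTG repeat length are not available in that dataset and would be added to the formula in a real DM1 analysis.

```r
library(survival)

# Stand-in data: survival::lung, with age and sex as baseline covariates
cox_fit <- coxph(Surv(time, status) ~ sex + age, data = survival::lung)

# Hazard ratios with 95% confidence intervals
hr <- exp(cbind(HR = coef(cox_fit), confint(cox_fit)))
print(round(hr, 3))

# Schoenfeld-residual test of the proportional-hazards assumption
ph_check <- cox.zph(cox_fit)
print(ph_check)
```

A hazard ratio below 1 indicates a lower event rate per unit increase in the covariate, and a small p-value from `cox.zph()` would suggest the proportional-hazards assumption needs closer inspection before the ratios are interpreted.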
The four analyses above trace a deliberate progression:
| # | Analysis | Type | System capability it demonstrates |
|---|---|---|---|
| 1 | Demographic overview | Descriptive | Fast queries over the ADSL materialised view |
| 2 | SBP trajectories | Longitudinal / time | Cross-visit exploration of vital signs |
| 3 | ALT & CRP change from BL | Longitudinal / safety | Change-from-baseline monitoring for dashboards |
| 4 | KM time to first AE | Time-to-event / inferential | Parameterised survival module for notebooks |
In the proposed dynamic analysis system, each of these would be: